Statistical Assumptions


Course Goals 


We attempt to accomplish two goals. 


• We explain the nature of the major “assumptions” underlying statistical tests. In the brief discussion here, we mention only those assumptions that we consider common enough and important enough to deserve comment, but we manage to include most of the major assumptions underlying frequently used statistical techniques.


• We provide some guidelines for examining the assumptions of statistical tests.


Introduction 
Why Do We Need Assumptions?
• A technique is sometimes developed under certain conditions and is valid only when those conditions hold.
• For example, the Pythagorean theorem applies only to a flat (Euclidean) plane.


Statistics Works with Data at the Sample Level
• A population has unlimited characteristics.
• Statistics works on samples with certain characteristics.
• The characteristics considered are usually general, common ones.
• For example, the conditions in the sample are assumed to match the conditions in the population.
Introduction


There are different perspectives on assumptions.
• The definition of statistical assumptions is well established.
• Statistical tests are developed based on several assumptions.
• However, views still vary on whether statistical assumptions need to be tested.
• Statistics is always restricted to conditions:
  • “Given ..., then ...”
  • “Let's say that ..., then ...”


Robustness 


Robustness of Statistical Tests


• The ability of a test to survive violations of its assumptions without the validity of the test being seriously compromised is called the robustness of the test. A robust test continues to perform well when its assumptions are violated.


Examples


• The t-test is robust against non-normality; its validity is in doubt only when there may be serious outliers (Snijders, 2011).


• Similarly, the t-test is so robust against non-normality that there is almost no need to use the Wilcoxon test when comparing expectations (Rasch, 2007).


• There is empirical evidence for the robustness of the analysis of variance (ANOVA) against violations of the normality assumption (Schmider, 2010).
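The robustness claims above can be checked by simulation. The sketch below is a minimal illustration, assuming two groups of n = 20 drawn from the same distribution (so the null hypothesis is true) and a two-sided 5% level. It counts how often Student's pooled t-test wrongly rejects, once with normal data and once with strongly right-skewed exponential data. The critical value 2.0244 (t with 38 degrees of freedom) is hard-coded because Python's standard library has no t distribution; the function names are our own.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def type1_rate(draw, n=20, reps=4000, t_crit=2.0244, seed=1):
    """Fraction of simulated data sets in which |t| > t_crit even though both
    groups come from the same distribution (so H0 is true).
    The default t_crit is the two-sided 5% critical value for df = 38,
    which matches only the default n = 20 per group."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / reps


normal_rate = type1_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = type1_rate(lambda rng: rng.expovariate(1.0))  # strongly right-skewed
print(f"Type I error, normal data:      {normal_rate:.3f}")
print(f"Type I error, exponential data: {skewed_rate:.3f}")
```

If the robustness claim holds for this setting, both rates should come out close to the nominal 0.05. Changing `n` requires changing `t_crit` to the matching critical value.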


Different Degrees of Robustness
• A test may be robust with regard to some assumptions but not others, and most tests may be particularly susceptible to bias if certain assumptions, such as independence of observations, are violated.


• Regarding other assumptions, the exact consequences of violating them may be unknown for certain tests.
• In some cases, “minor” violations cause no major problems, but “severe” violations bias the results.
• Where to draw the line between minor and severe violations may be in dispute.


Robustness of Statistical Tests: Conditional


Some robustness of statistical tests against violations of assumptions can be achieved through appropriate study design.


• In general, with violations of homogeneity of variance, the analysis is considered robust if the groups are of equal size. With violations of normality, continuing with the ANOVA generally creates no problem if the sample size is large (Jose, 2018).


• Equal sample sizes make the ANOVA test very robust to violations of the normality assumption that are not too serious. Moreover, the Tukey post hoc test is appropriate in this equal-size case, since it would be robust for violation of the … (Cheng, 2004)




• Box (1954a) reported that the ANOVA F test is robust with respect to violation of the homogeneity of ... equal number of observations in each of the treatment levels ...


• When sample sizes are large (i.e., when both groups have more than 25 participants each) and approximately equal in size, the robustness of this test to violations of the normality assumption is improved (Diekhoff, 1992).
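The design effect described above (equal group sizes protecting against unequal variances) can be illustrated with a small simulation. This is a sketch under our own assumptions: normal data with both population means equal to 0, group A having three times the standard deviation of group B, and Student's pooled t-test at the two-sided 5% level. With two groups the ANOVA F statistic equals t², so the same pattern applies to the F test. Both designs use 20 observations in total (df = 18), so one critical value, 2.1009, serves both; the function names are hypothetical.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def type1_rate(n_a, n_b, sd_a, sd_b, t_crit, reps=4000, seed=7):
    """Fraction of simulated data sets in which the pooled t-test rejects H0
    (|t| > t_crit) although both population means are 0, so H0 is true.
    Groups are normal but may have unequal standard deviations."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0.0, sd_b) for _ in range(n_b)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / reps


# Both designs have df = n_a + n_b - 2 = 18;
# 2.1009 is the two-sided 5% critical value of t with 18 df.
T_CRIT_DF18 = 2.1009

balanced = type1_rate(10, 10, sd_a=3.0, sd_b=1.0, t_crit=T_CRIT_DF18)
# Unbalanced design with the larger variance in the smaller group.
unbalanced = type1_rate(5, 15, sd_a=3.0, sd_b=1.0, t_crit=T_CRIT_DF18)
print(f"equal n (10/10):  {balanced:.3f}")
print(f"unequal n (5/15): {unbalanced:.3f}")
```

In this setting the equal-n rate should stay near 0.05, while the unequal-n rate should be clearly inflated, which is the pattern the robustness-through-design argument predicts.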


Parametric vs. Nonparametric


#1 Types of Data 
Types of Data


• Ordinal: arithmetic operations cannot be applied
• Interval/Ratio: arithmetic operations can be applied


Parametric vs. Nonparametric


• Parametric statistical tests use all arithmetic operations
• Nonparametric statistical tests use only simple arithmetic operations
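To make the contrast concrete, here is a small sketch comparing a parametric-style summary (the difference of means, which needs interval/ratio arithmetic) with a nonparametric one, the Mann-Whitney U count, which uses only order comparisons and is therefore meaningful for ordinal data. The Mann-Whitney U statistic is our choice of illustration, not one named on the slide, and the function names and data are hypothetical.

```python
import statistics


def mean_difference(a, b):
    """Parametric-style summary: adds and divides the raw values."""
    return statistics.mean(a) - statistics.mean(b)


def mann_whitney_u(a, b):
    """Nonparametric-style summary: counts pairs (x from a, y from b) with x > y,
    counting ties as 1/2. Only comparisons are used, never arithmetic on values."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


a = [1, 2, 3, 4, 100]
b = [0, 1, 2, 3, 4]
# Squaring non-negative scores preserves their order but not their distances.
a_sq = [x ** 2 for x in a]
b_sq = [x ** 2 for x in b]
print(mean_difference(a, b), mean_difference(a_sq, b_sq))
print(mann_whitney_u(a, b), mann_whitney_u(a_sq, b_sq))
```

The U count stays at 17.0 under the order-preserving transformation, while the mean difference jumps from 20 to 2000, which is why mean-based (parametric) summaries require interval/ratio data.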


#2 Assumptions
Assumptions


• Normally distributed data
• Large sample size, etc.


Parametric vs. Nonparametric


• Parametric tests rely on several assumptions.
• Nonparametric tests are distribution-free and rely on fewer assumptions.


#3 Tradition 
Parametric vs. Nonparametric


• Tradition, as shaped by the:
  • Field of study
  • Discipline
  • University
  • Research grant
  • Publisher


Several Assumptions 


Assumption, Requirement, Goodness of Fit 





Assumptions
• Something that is considered true even without proof
• Assumptions do not require proof


Requirement
• A condition that must be met in order to do something
• Prerequisites must be met before anything is done


Model Fit
• A measure of the extent to which something meets the expected criteria


Assumptions of the t-test #1


• A dichotomous independent variable (groups A and B)
• A continuous dependent variable


• Each observation of the dependent variable is independent of the other observations of the dependent variable.


• The dependent variable is normally distributed, with the same variance, σ², in each group.
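Under these assumptions (independent observations, normal distributions, common variance σ²), both sample variances estimate the same σ², which is why they can be pooled into one estimate. The standard pooled two-sample t statistic is shown below; the symbols n_A, n_B (group sizes), x̄_A, x̄_B (sample means), and s_A², s_B² (sample variances) are our own notation, not the slide's.

```latex
s_p^2 = \frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2},
\qquad
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}
\;\sim\; t_{n_A + n_B - 2} \quad \text{under } H_0\colon \mu_A = \mu_B .
```

If the two population variances differ, s_p² no longer estimates a single common quantity, which is where the homogeneity assumption enters this formula.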


Assumptions of the t-test #2

Independence

• The observations in one sample are independent of the observations in the other sample.

Normality

• Both samples are approximately normally distributed.

Homogeneity of Variances

• Both samples have approximately the same variance.

Random Sampling

• Both samples were obtained using a random sampling method.
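Random sampling is a property of how the data were collected, so it cannot be tested afterwards; it has to be built into the design. The sketch below shows one simple way to do this, assuming a complete list (sampling frame) of population units is available: draw a simple random sample without replacement and split it into two groups so that no unit appears in both. The frame of 1,000 IDs and the function name are hypothetical.

```python
import random


def draw_two_samples(frame, n_a, n_b, seed=None):
    """Draw a simple random sample of n_a + n_b distinct units from `frame`
    (every unit is equally likely to be chosen) and split it into two groups,
    so that no unit appears in both samples."""
    rng = random.Random(seed)
    chosen = rng.sample(frame, n_a + n_b)  # sampling without replacement
    return chosen[:n_a], chosen[n_a:]


# Hypothetical sampling frame: IDs of 1,000 population units.
frame = list(range(1, 1001))
group_a, group_b = draw_two_samples(frame, 25, 25, seed=42)
print(group_a[:5], group_b[:5])
```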


[Figure: Normal distribution]


• How to assess the normality of the data?


1. With large enough sample sizes (n > 30), the violation of the normality assumption should not cause major problems (central limit theorem). This implies that we can ignore the distribution of the data and use parametric tests.


2. However, to be consistent, we can use the Shapiro-Wilk significance test, which compares the sample distribution to a normal one, to ascertain whether or not the data show a serious deviation from normality.
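The central-limit-theorem reasoning in point 1 can be seen in a short simulation. This sketch assumes a strongly right-skewed exponential population and measures the skewness of sample means for n = 1, 5, and 30; for this population the theoretical skewness of the mean is 2/√n, so it should shrink toward the normal value of 0 as n grows. For point 2, the Shapiro-Wilk test itself is available in libraries such as SciPy (`scipy.stats.shapiro`); the sketch does not run it. The function names are our own.

```python
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def sample_means(n, reps=5000, seed=3):
    """Means of `reps` samples of size n drawn from an exponential population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


skews = {n: skewness(sample_means(n)) for n in (1, 5, 30)}
for n, g in skews.items():
    print(f"n = {n:2d}: skewness of the sample means = {g:.2f}")
```

The printed values should fall roughly near 2, 0.9, and 0.37, showing the sampling distribution of the mean approaching symmetry even though the raw data stay skewed.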


• How to assess the equality of variances?


• The standard Student's t-test (comparing two independent samples) and the ANOVA test (comparing multiple samples) also assume that the samples being compared have equal variances.


• If the samples being compared follow a normal distribution, it is possible to use:
  • the F-test to compare the variances of two samples
  • Bartlett’s test or Levene’s test to compare the variances of multiple samples.
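The three tests above each reduce to one statistic. The sketch below computes them with the standard library only; it returns statistics, not p-values. As commonly defined, the F ratio is compared with an F(n₁ − 1, n₂ − 1) distribution, Levene's W with F(k − 1, N − k), and Bartlett's statistic with χ²(k − 1). Libraries such as SciPy provide full versions with p-values (`scipy.stats.levene`, `scipy.stats.bartlett`); note that SciPy's `levene` centers on the median by default (the Brown-Forsythe variant), whereas this sketch uses the original mean-centered form. The function names are hypothetical.

```python
import math
import statistics


def f_ratio(a, b):
    """Two-sample F statistic: ratio of the sample variances of a and b."""
    return statistics.variance(a) / statistics.variance(b)


def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    z = [[abs(y - m) for y in g] for g, m in zip(groups, means)]
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((zij - m) ** 2 for zi, m in zip(z, z_means) for zij in zi)
    return (n_total - k) / (k - 1) * between / within


def bartlett_t(*groups):
    """Bartlett's statistic: compares the pooled log-variance with the group log-variances."""
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    variances = [statistics.variance(g) for g in groups]
    pooled = sum((n - 1) * v for n, v in zip(ns, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, variances))
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


print(f_ratio([2, 4, 6], [1, 2, 3]))
print(levene_w([1, 2, 3], [2, 4, 6]))
print(bartlett_t([1, 2, 3], [2, 4, 6]))
```

Groups with identical variances, such as [1, 2, 3], [4, 5, 6], and [7, 8, 9], give W = 0 and a Bartlett statistic of 0, while larger values point toward unequal variances.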


